

@zachschuermann
Member

@zachschuermann zachschuermann commented Oct 7, 2025

What changes are proposed in this pull request?

Catalog-managed writes POC. Not intended to merge, just e2e derisking. Major pieces:

  • copy engine API (with default engine impl)
  • publish APIs
  • `LogSegment`: add `latest_published_commit` (open question: how to expose it?)
  • UC client changes: commit API
  • UC catalog changes: (1) UCCommitter (2) e2e example

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Oct 7, 2025
@nicklan nicklan self-requested a review October 14, 2025 23:20
zachschuermann added a commit that referenced this pull request Oct 22, 2025
## What changes are proposed in this pull request?
Adds a new required method, `copy_atomic(&self, src: &Url, dest: &Url)
-> DeltaResult<()>`, to `StorageHandler`. This PR also adds support in
the default engine via the naive approach of a GET followed by a PUT.
I've elected to pursue the simple and correct thing here; we can attempt
to optimize in the future (and can open a follow-up if others agree).
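
To illustrate the GET/PUT approach, here is a minimal sketch rather than the PR's actual implementation. It assumes a toy in-memory store in place of a real object store, plain string paths in place of `Url`, and a hypothetical `SketchError` in place of `DeltaResult`. The names `InMemoryStore`, `put_if_absent`, and `SketchError` are invented for illustration. The point is that the copy reads the source in full and then writes the destination only if it does not already exist.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical error type for this sketch (the PR itself returns `DeltaResult<()>`).
#[derive(Debug, PartialEq)]
pub enum SketchError {
    NotFound(String),
    AlreadyExists(String),
}

/// Toy in-memory store standing in for an object store (an assumption of this sketch).
#[derive(Default)]
pub struct InMemoryStore {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl InMemoryStore {
    /// Unconditional write, used here only to seed test data.
    pub fn put(&self, path: &str, data: Vec<u8>) {
        self.objects.lock().unwrap().insert(path.to_string(), data);
    }

    /// GET: read the full contents of `path`.
    pub fn get(&self, path: &str) -> Result<Vec<u8>, SketchError> {
        self.objects
            .lock()
            .unwrap()
            .get(path)
            .cloned()
            .ok_or_else(|| SketchError::NotFound(path.to_string()))
    }

    /// PUT that fails if `path` already exists. The existence check and the
    /// write happen under one lock, so they are atomic within this toy store.
    pub fn put_if_absent(&self, path: &str, data: Vec<u8>) -> Result<(), SketchError> {
        let mut objects = self.objects.lock().unwrap();
        match objects.entry(path.to_string()) {
            Entry::Occupied(_) => Err(SketchError::AlreadyExists(path.to_string())),
            Entry::Vacant(slot) => {
                slot.insert(data);
                Ok(())
            }
        }
    }

    /// Copy as GET followed by a conditional PUT: never overwrites `dest`.
    pub fn copy_atomic(&self, src: &str, dest: &str) -> Result<(), SketchError> {
        let data = self.get(src)?;
        self.put_if_absent(dest, data)
    }
}
```

In this sketch the source is read into memory in full before the write, which is the cost of keeping the implementation simple. A store with a native server-side copy could avoid that round trip.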

~This implementation proposes a slight departure from existing `Engine`
APIs: instead of returning a `DeltaResult<()>` we return `Result<(),
CopyError>` with CopyError defined as:~
<details>
  <summary>old pieces on CopyError omitted</summary>

```rust
#[derive(thiserror::Error, Debug)]
pub enum CopyError {
    #[error("Destination file already exists: {0}")]
    DestinationAlreadyExists(String),
    #[error(transparent)]
    Other(#[from] Box<dyn std::error::Error + Send + Sync>),
}
```
It captures the only two cases we care about from the `copy` API
perspective. Either the destination already exists, in which case we can
return a clear error message telling the user their commit has already
been published (publishing is the main use case of this API for now),
_or_ we got back some other error. We don't care what that other error
is; we only need to surface it to the user and fail the overall publish
API.

I've used this PR as an opportunity to introduce an Engine API more
aligned with our pursuit of finer-grained errors (especially for the
`Engine` trait), but I'm happy to split this out if we think it's better
to retain the existing `DeltaResult` pattern.
</details>

### Motivation

This PR will be used for commit publishing - basically copying commits
from staged commits to published commits. See #1377 for some context on
future usage.
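
As a rough sketch of how publishing might use an atomic copy (this is not the PR's actual publish API), the example below copies each staged commit to its published path and treats "destination already exists" as "already published". The `AtomicCopy` trait, `CopyError` enum, `publish` function, and `FakeStorage` type are all hypothetical stand-ins. The only detail taken from the Delta log layout is that published commit files are named with a 20-digit zero-padded version.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical error type covering the two cases this sketch distinguishes.
#[derive(Debug, PartialEq)]
pub enum CopyError {
    DestinationAlreadyExists(String),
    Other(String),
}

/// Hypothetical stand-in for the storage handler's atomic copy.
pub trait AtomicCopy {
    fn copy_atomic(&self, src: &str, dest: &str) -> Result<(), CopyError>;
}

/// Published commit path: 20-digit zero-padded version plus `.json`.
pub fn published_path(log_root: &str, version: u64) -> String {
    format!("{log_root}/{version:020}.json")
}

/// Copy each staged commit to its published location. A destination that
/// already exists is skipped as already published; any other error aborts.
/// Returns how many commits were newly published.
pub fn publish<S: AtomicCopy>(
    storage: &S,
    log_root: &str,
    staged: &[(u64, String)],
) -> Result<usize, CopyError> {
    let mut newly_published = 0;
    for (version, staged_path) in staged {
        let dest = published_path(log_root, *version);
        match storage.copy_atomic(staged_path, &dest) {
            Ok(()) => newly_published += 1,
            Err(CopyError::DestinationAlreadyExists(_)) => {}
            Err(other) => return Err(other),
        }
    }
    Ok(newly_published)
}

/// Toy storage that only tracks which paths exist (an assumption of this sketch).
#[derive(Default)]
pub struct FakeStorage {
    paths: Mutex<HashSet<String>>,
}

impl FakeStorage {
    /// Register a path as existing, e.g. a staged commit file.
    pub fn add(&self, path: &str) {
        self.paths.lock().unwrap().insert(path.to_string());
    }
}

impl AtomicCopy for FakeStorage {
    fn copy_atomic(&self, src: &str, dest: &str) -> Result<(), CopyError> {
        let mut paths = self.paths.lock().unwrap();
        if !paths.contains(src) {
            return Err(CopyError::Other(format!("source not found: {src}")));
        }
        // `insert` returns false when the destination was already present.
        if !paths.insert(dest.to_string()) {
            return Err(CopyError::DestinationAlreadyExists(dest.to_string()));
        }
        Ok(())
    }
}
```

Skipping destinations that already exist makes this `publish` idempotent: calling it again after a partial failure copies only the commits that are still unpublished.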

### This PR affects the following public APIs

New required method in `StorageHandler` trait: `copy_atomic`


## How was this change tested?
New unit test for the default engine implementation.